Skip to content

CMR-11272: Replace Clojurewerkz#2418

Open
jmaeng72 wants to merge 4 commits intoCMR-10369-es8-featurefrom
CMR-11272
Open

CMR-11272: Replace Clojurewerkz#2418
jmaeng72 wants to merge 4 commits intoCMR-10369-es8-featurefrom
CMR-11272

Conversation

@jmaeng72
Copy link
Copy Markdown
Contributor

@jmaeng72 jmaeng72 commented Apr 30, 2026

Overview

What is the objective?

Replace clojurewerkz because it does not work well with elastic 8 and updated versions of the dependency do not support 8

What are the changes?

Replace clojurewerkz with clj-http

What areas of the application does this impact?

All

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • Refactor
    • Removed the Elastisch library dependency and replaced it with direct HTTP calls for Elasticsearch operations.
    • Improved error handling and response parsing for Elasticsearch queries and operations.
    • Simplified connection management with a new internal connection structure.

@jmaeng72 jmaeng72 self-assigned this Apr 30, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e4b16e7-c565-4da2-93e0-d2c70ab4828e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR removes the clojurewerkz.elastisch dependency across multiple CMR modules and replaces all Elasticsearch client calls with direct clj-http.client HTTP requests. A custom Connection record now holds the URI and HTTP options, and new helper functions parse JSON responses and construct Elasticsearch endpoint URLs consistently.

Changes

Cohort / File(s) Summary
Dependency Management
elastic-utils-lib/project.clj
Removes clojurewerkz/elastisch dependency and elevates ring/ring-jetty-adapter from profile-specific to default project dependencies.
ES Connection Layer
elastic-utils-lib/src/cmr/elastic_utils/connect.clj
Introduces Connection record with uri and http-opts fields; replaces elastisch client initialization and rewrites cluster health probing via direct GET /_cluster/health HTTP calls using clj-http.client.
ES Document & Index Operations
elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj, elastic-utils-lib/src/cmr/elastic_utils/es_index_helper.clj
Removes elastisch REST document/index helpers and reimplements all ES operations (search, count-query, scroll, doc-get, put, delete, delete-by-query, delete-index, bulk, exists?, update-mapping, create, refresh, update-aliases, get-aliases, create-index-template, get-mapping, get-settings) with direct clj-http.client calls; adds internal JSON parsing and URL construction helpers.
Query & Search Abstractions
elastic-utils-lib/src/cmr/elastic_utils/search/es_wrapper.clj, elastic-utils-lib/src/cmr/elastic_utils/search/es_query_to_elastic.clj, elastic-utils-lib/src/cmr/elastic_utils/search/es_debug.clj, elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj
Replaces elastisch query helpers with direct Elasticsearch DSL maps ({:match_all {}}); updates search delegation to use new es_helper functions and refactors query option passing.
Error Handling & Index Utilities
elastic-utils-lib/src/cmr/elastic_utils/index_util.clj, indexer-app/src/cmr/indexer/data/elasticsearch.clj, indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj
Updates exception handling to extract and parse JSON error bodies from ExceptionInfo; changes try-elastic-operation to return structured {:error :status} maps instead of throwing; replaces elastisch index URL construction with manual URI concatenation.
Indexer App Data Layer
bootstrap-app/src/cmr/bootstrap/data/rebalance_util.clj, indexer-app/src/cmr/indexer/data/collection_granule_aggregation_cache.clj
Replaces elastisch query builders with direct Elasticsearch DSL JSON; removes elastisch imports and dependencies.
Integration Tests
indexer-app/int-test/cmr/indexer/test/index_fields.clj, indexer-app/int-test/cmr/indexer/test/utility.clj, indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj, system-int-test/test/cmr/system_int_test/search/misc/autocomplete_collection_facet_search_test.clj
Switches test Elasticsearch connection setup from esr/connect to new cmr.elastic-utils.connect/try-connect; updates document operations to use new helpers and direct HTTP calls; migrates assertions to use es_index_helper functions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • zimzoom
  • daniel-zamora
  • eereiter
  • jceaser

Poem

🐰 No more elastisch, just HTTP calls so clean,
Direct to the cluster, the fastest we've seen,
Connection records dance with clj-http grace,
JSON flows freely through cyberspace,
One less dependency, the codebase runs lean!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is incomplete. Multiple required checklist items are unchecked, indicating incomplete work before submission. Check and complete the clj-kondo verification, confirm documentation changes, verify no new warnings, and address the additional cleanup checklist items before merging.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'CMR-11272: Replace Clojurewerkz' is concise and clearly identifies the main objective—replacing the Clojurewerkz library throughout the codebase with clj-http.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch CMR-11272

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jmaeng72 jmaeng72 marked this pull request as draft April 30, 2026 17:27
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj (1)

150-155: ⚠️ Potential issue | 🟠 Major

Merge connection HTTP options into the index deletion request.

Lines 151-155 make a raw client/delete call without merging the connection's HTTP options, inconsistent with how elasticsearch.clj handles the same operation. This omits configured timeouts, retry policies, and connection managers, reducing reliability during load.

Proposed fix
    (when (esi-helper/exists? conn index-name)
      (let [admin-token (:admin-token config)
+           http-opts (:http-opts conn)
            response (client/delete (str (:uri conn) "/" index-name)
-                                   {:headers {"Authorization" admin-token
-                                              "Confirm-delete-action" "true"
-                                              :client-id t-config/cmr-client-id}
-                                    :throw-exceptions false})
+                                   (merge http-opts
+                                          {:headers (merge (:headers http-opts)
+                                                           {"Authorization" admin-token
+                                                            "Confirm-delete-action" "true"
+                                                            :client-id t-config/cmr-client-id})
+                                           :throw-exceptions false}))
            status (:status response)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj` around lines
150 - 155, The delete call is missing the connection's HTTP options so
timeouts/retries/connection managers are not applied; update the client/delete
invocation in index_set_elasticsearch.clj to merge (:http-options conn) into the
request map (while keeping :headers with admin-token, "Confirm-delete-action",
:client-id and :throw-exceptions false) so the final request combines
connection-level options with the per-call options—mirror the pattern used in
elasticsearch.clj when calling client/delete and ensure you reference conn,
:http-options, index-name and client/delete.
🧹 Nitpick comments (2)
elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj (2)

234-236: 💤 Low value

Consider using when-not for clarity.

The current if-not without an else branch relies on throw-service-error to never return. Using when-not would make the intent clearer that this is a guard clause.

Proposed refactor
-      (if-not index-found-in-description
-        (errors/throw-service-error :internal-error (format "Given task id %s is not tracking the given index %s because description in task [%s] did not include the index. Mismatch on task id with index error." reindex-task-id index description)))
+      (when-not index-found-in-description
+        (errors/throw-service-error :internal-error (format "Given task id %s is not tracking the given index %s because description in task [%s] did not include the index. Mismatch on task id with index error." reindex-task-id index description)))
       full-status)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj` around lines 234 -
236, Replace the guard written with if-not by using when-not to make intent
explicit: in the block that checks index-found-in-description and calls
errors/throw-service-error with the formatted message (which currently mentions
reindex-task-id, index, and description), change the form "(if-not
index-found-in-description (errors/throw-service-error ...))" to "(when-not
index-found-in-description (errors/throw-service-error ...))" so the code
clearly expresses a guard/early-exit before returning full-status.

110-137: 💤 Low value

No status validation for put and delete responses.

Both functions have :throw-exceptions false but return the decoded response regardless of HTTP status. On write failures (e.g., version conflicts, mapping errors), callers will receive an error body rather than having an exception thrown. If this matches the old Elastisch behavior, consider adding a code comment to document this intentional design choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj` around lines 110 -
137, The put and delete functions currently return decoded responses even on
non-2xx HTTP status because they set :throw-exceptions false; update both
functions (put and delete) to validate the HTTP response status after the http
call: if the response status is not in the 200–299 range, raise/throw an
informative exception (e.g., ex-info) that includes the status and response body
so callers get a clear failure instead of an error payload, otherwise return the
decoded response; alternatively, if this behavior is intentional to preserve
legacy Elastisch semantics, add a clear code comment inside put and delete
explaining that non-2xx responses are intentionally returned (and why) so
reviewers understand it is deliberate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@elastic-utils-lib/src/cmr/elastic_utils/connect.clj`:
- Around line 62-78: get-elastic-health currently swallows exceptions and
returns a string on failure which makes wait-for-healthy-elastic (which only
checks :timed_out) assume success; change the catch in get-elastic-health to
propagate a structured error (e.g. call errors/internal-error! with a
descriptive message and the exception details or rethrow the exception) so
callers always get a failure instead of a plain string, and ensure
wait-for-healthy-elastic (which calls get-elastic-health) only treats a
successful map response as valid (i.e. validate the response is a map and
contains :timed_out) and raise an internal-error! when the health call fails or
returns an unexpected type.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj`:
- Around line 95-108: doc-get currently treats any non-404 status as a
successful document and decodes the body, which can make 500s look like missing
docs; update doc-get to explicitly branch on the HTTP status: if status is 200
(or 2xx) call decode-response and return the document, if status is 404 return
nil, otherwise surface the error to callers (e.g., throw an ex-info or return an
error map containing status and decoded body) so functions like get-document and
tests cannot mistake server errors for "not found"; use the existing
url-with-path and decode-response helpers when decoding the error body so the
payload is included in the raised error/returned error map.
- Around line 82-93: The scroll function currently posts without disabling
clj-http exceptions and without checking status; update the http/post call in
scroll to merge {:throw-exceptions false} into the request options (same pattern
used in search/count-query/delete-by-query), capture the returned response map
(e.g., resp), check the :status for non-2xx cases and return a structured error
response rather than letting clj-http throw, and pass that response into
decode-response so callers get consistent behavior; reference the scroll
function and the http/post call and mirror the status handling used by
search/count-query/delete-by-query.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_index_helper.clj`:
- Around line 48-50: The current predicate treats any non-200 as "missing" by
returning false; update the logic around the client/head call (using
url-with-path, client/head, (:http-opts conn) and index-name) so that it returns
true for status 200, returns false only for status 404, and for any other status
throws or propagates an error (e.g. throw or ex-info with the response) so
transient 401/403/5xx failures are surfaced; apply the same change to the
analogous helper at the other block (lines ~121-130).

In `@indexer-app/src/cmr/indexer/data/elasticsearch.clj`:
- Around line 324-327: The error handler currently calls json/decode on any
string body (in the let binding that defines body/status/parsed-body) which will
throw for non-JSON responses; replace that direct json/decode call with the
existing safe parsing pattern used elsewhere (e.g., call the safe-parse-json
helper from transmit/http_helper.clj or wrap json/decode in a try/catch that
returns the original string on failure) so parsed-body never throws on plain
text/HTML; update the let where parsed-body is computed to use safe-parse-json
(or equivalent try/catch) instead of (json/decode body true).

---

Outside diff comments:
In `@indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj`:
- Around line 150-155: The delete call is missing the connection's HTTP options
so timeouts/retries/connection managers are not applied; update the
client/delete invocation in index_set_elasticsearch.clj to merge (:http-options
conn) into the request map (while keeping :headers with admin-token,
"Confirm-delete-action", :client-id and :throw-exceptions false) so the final
request combines connection-level options with the per-call options—mirror the
pattern used in elasticsearch.clj when calling client/delete and ensure you
reference conn, :http-options, index-name and client/delete.

---

Nitpick comments:
In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj`:
- Around line 234-236: Replace the guard written with if-not by using when-not
to make intent explicit: in the block that checks index-found-in-description and
calls errors/throw-service-error with the formatted message (which currently
mentions reindex-task-id, index, and description), change the form "(if-not
index-found-in-description (errors/throw-service-error ...))" to "(when-not
index-found-in-description (errors/throw-service-error ...))" so the code
clearly expresses a guard/early-exit before returning full-status.
- Around line 110-137: The put and delete functions currently return decoded
responses even on non-2xx HTTP status because they set :throw-exceptions false;
update both functions (put and delete) to validate the HTTP response status
after the http call: if the response status is not in the 200–299 range,
raise/throw an informative exception (e.g., ex-info) that includes the status
and response body so callers get a clear failure instead of an error payload,
otherwise return the decoded response; alternatively, if this behavior is
intentional to preserve legacy Elastisch semantics, add a clear code comment
inside put and delete explaining that non-2xx responses are intentionally
returned (and why) so reviewers understand it is deliberate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe4bf198-cff5-4883-b81b-ee04103e29f5

📥 Commits

Reviewing files that changed from the base of the PR and between 565367d and 6eab2a0.

📒 Files selected for processing (17)
  • bootstrap-app/src/cmr/bootstrap/data/rebalance_util.clj
  • elastic-utils-lib/project.clj
  • elastic-utils-lib/src/cmr/elastic_utils/connect.clj
  • elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj
  • elastic-utils-lib/src/cmr/elastic_utils/es_index_helper.clj
  • elastic-utils-lib/src/cmr/elastic_utils/index_util.clj
  • elastic-utils-lib/src/cmr/elastic_utils/search/es_debug.clj
  • elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj
  • elastic-utils-lib/src/cmr/elastic_utils/search/es_query_to_elastic.clj
  • elastic-utils-lib/src/cmr/elastic_utils/search/es_wrapper.clj
  • indexer-app/int-test/cmr/indexer/test/services/index_fields.clj
  • indexer-app/int-test/cmr/indexer/test/utility.clj
  • indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj
  • indexer-app/src/cmr/indexer/data/collection_granule_aggregation_cache.clj
  • indexer-app/src/cmr/indexer/data/elasticsearch.clj
  • indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj
  • system-int-test/test/cmr/system_int_test/search/misc/autocomplete_collection_facet_search_test.clj

Comment on lines +62 to 78
(try
(let [url (str (:uri conn) "/_cluster/health")
response (client/get url (merge (:http-opts conn)
{:query-params params
:accept :json}))]
(json/decode (:body response) true))
(catch Exception e
(format "Unable to get elasticsearch cluster health, caught exception: %s"
(.getMessage e))))))

(defn wait-for-healthy-elastic
"Waits for the elasticsearch cluster health to reach yellow. Pass in a elasticsearch store that
has a :conn key with the elastisch connection"
[elastic-store]
(when (:timed_out (admin/cluster-health
(:conn elastic-store) {:wait_for_status "yellow" :timeout "3s"}))
(when (:timed_out (get-elastic-health (:conn elastic-store)
{:wait_for_status "yellow" :timeout "3s"}))
(errors/internal-error! "Timed out waiting for elasticsearch to reach a healthy state")))
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow cluster health failures during startup.

Lines 68-70 return a string on request failure, and Lines 76-78 only inspect :timed_out. If the health call fails entirely, wait-for-healthy-elastic will treat that as success and continue with an unavailable cluster.

Suggested fix
 (defn wait-for-healthy-elastic
   "Waits for the elasticsearch cluster health to reach yellow. Pass in a elasticsearch store that
   has a :conn key with the elastisch connection"
   [elastic-store]
-  (when (:timed_out (get-elastic-health (:conn elastic-store)
-                                        {:wait_for_status "yellow" :timeout "3s"}))
-    (errors/internal-error! "Timed out waiting for elasticsearch to reach a healthy state")))
+  (let [health (get-elastic-health (:conn elastic-store)
+                                   {:wait_for_status "yellow" :timeout "3s"})]
+    (cond
+      (string? health)
+      (errors/internal-error! health)
+
+      (:timed_out health)
+      (errors/internal-error! "Timed out waiting for elasticsearch to reach a healthy state"))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/connect.clj` around lines 62 - 78,
get-elastic-health currently swallows exceptions and returns a string on failure
which makes wait-for-healthy-elastic (which only checks :timed_out) assume
success; change the catch in get-elastic-health to propagate a structured error
(e.g. call errors/internal-error! with a descriptive message and the exception
details or rethrow the exception) so callers always get a failure instead of a
plain string, and ensure wait-for-healthy-elastic (which calls
get-elastic-health) only treats a successful map response as valid (i.e.
validate the response is a map and contains :timed_out) and raise an
internal-error! when the health call fails or returns an unexpected type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code rabbit found an existing issue, not something that you created. Do we care about this? Is this just used for testing or really connecting upon startup? I see, this is called from reset. Not sure that we really care here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all usages of wait-for-healthy-elastic
rg -n "wait-for-healthy-elastic" --type clj -C3

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Find all usages of wait-for-healthy-elastic
rg -n "wait-for-healthy-elastic" -C3

Length of output: 2271


🏁 Script executed:

#!/bin/bash
# Check the index_util.clj context around the call
rg -n "wait-for-healthy-elastic" -A5 -B10 elastic-utils-lib/src/cmr/elastic_utils/index_util.clj

# Also check dev-system control.clj broader context
rg -n "reset" -C5 dev-system/src/cmr/dev_system/control.clj | head -60

Length of output: 3391


@eereiter, good question — it's actually called in both places:

  1. Dev/test pathdev-system/control.clj Lines 122–123, inside the POST /reset handler used during integration testing. You're right that this is relatively low-risk: if ES is unreachable the tests will fail regardless, and an incorrect 200 from /reset would be obvious.

  2. Production/startup pathelastic-utils-lib/src/cmr/elastic_utils/index_util.clj Line 181, called after creating a new index during index initialization. This one is more meaningful — if the health check silently swallows an exception here, the code continues to set up aliases and refresh the index even if ES is in a bad state at that moment.

So the bug does touch production-relevant code via index_util.clj. Whether it's worth fixing now depends on how likely a health-check failure is in practice right after a successful create call. If the team is comfortable with the current behavior (pre-existing risk), it can be deferred. But if you want a quick guard, the suggested fix in my earlier comment would cover both call sites cleanly.

Happy to leave this as a follow-up issue if you'd prefer not to address it in this PR.

Comment on lines 82 to +93
(defn scroll
"Performs a scroll query, fetching the next page of results from a query given a scroll id"
[conn scroll-id opts]
(doc/scroll conn scroll-id opts))
(let [url (url-with-path conn "_search" "scroll")
body (merge {:scroll_id scroll-id}
(select-keys opts [:scroll]))]
(decode-response
(http/post url
(merge (:http-opts conn)
{:content-type :json
:body (json/generate-string body)
:accept :json})))))
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing error handling for scroll requests.

Unlike search, count-query, and delete-by-query, this function lacks :throw-exceptions false and status checking. If ES returns a non-2xx status (e.g., 404 for expired scroll), clj-http will throw an exception rather than returning a structured error response. This inconsistency may cause different error behavior than callers expect.

Proposed fix to add consistent error handling
 (defn scroll
   "Performs a scroll query, fetching the next page of results from a query given a scroll id"
   [conn scroll-id opts]
   (let [url (url-with-path conn "_search" "scroll")
         body (merge {:scroll_id scroll-id}
-                    (select-keys opts [:scroll]))]
-    (decode-response
-     (http/post url
-                (merge (:http-opts conn)
-                       {:content-type :json
-                        :body (json/generate-string body)
-                        :accept :json})))))
+                    (select-keys opts [:scroll]))
+        response (http/post url
+                            (merge (:http-opts conn)
+                                   {:content-type :json
+                                    :body (json/generate-string body)
+                                    :accept :json
+                                    :throw-exceptions false}))
+        status (:status response)]
+    (if (some #{status} [200 201])
+      (decode-response response)
+      (throw (ex-info (str "Scroll failed with status " status)
+                      {:status status :body (:body response)})))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj` around lines 82 - 93,
The scroll function currently posts without disabling clj-http exceptions and
without checking status; update the http/post call in scroll to merge
{:throw-exceptions false} into the request options (same pattern used in
search/count-query/delete-by-query), capture the returned response map (e.g.,
resp), check the :status for non-2xx cases and return a structured error
response rather than letting clj-http throw, and pass that response into
decode-response so callers get consistent behavior; reference the scroll
function and the http/post call and mirror the status handling used by
search/count-query/delete-by-query.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will make the change

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmaeng72, sounds good! Once the change is in, the scroll function will have consistent error handling aligned with search, count-query, and delete-by-query. The proposed fix in the review comment should serve as a reference when you make the update.

Comment on lines +95 to +108
(defn doc-get
"Fetches and returns a document by id or `nil` if it does not exist."
([conn index mapping-type id]
(doc-get conn index mapping-type id nil))
([conn index _mapping-type id opts]
(let [result (if (empty? opts)
(rest/get conn (rest/record-url conn index "_doc" id))
(rest/get conn (rest/record-url conn index "_doc" id) {:query-params opts}))]
(when-not (not-found? result)
result))))
(let [url (url-with-path conn index "_doc" id)
response (http/get url
(merge (:http-opts conn)
{:query-params opts
:accept :json
:throw-exceptions false}))
status (:status response)]
(when-not (= 404 status)
(decode-response response)))))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error statuses other than 404 are not handled.

The function returns nil on 404 and decodes the response for all other statuses. If Elasticsearch returns a 500 or other error status, the error body will be returned as if it were a valid document. Callers (e.g., get-document in elasticsearch.clj and tests checking the :found field) may silently misinterpret server errors as "document not found."

Proposed fix to handle error statuses
 (defn doc-get
   "Fetches and returns a document by id or `nil` if it does not exist."
   ([conn index mapping-type id]
    (doc-get conn index mapping-type id nil))
   ([conn index _mapping-type id opts]
    (let [url (url-with-path conn index "_doc" id)
          response (http/get url
                             (merge (:http-opts conn)
                                    {:query-params opts
                                     :accept :json
                                     :throw-exceptions false}))
          status (:status response)]
-     (when-not (= 404 status)
-       (decode-response response)))))
+     (cond
+       (= 404 status) nil
+       (< status 300) (decode-response response)
+       :else (throw (ex-info (str "Document get failed with status " status)
+                             {:status status :body (:body response)}))))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj` around lines 95 - 108,
doc-get currently treats any non-404 status as a successful document and decodes
the body, which can make 500s look like missing docs; update doc-get to
explicitly branch on the HTTP status: if status is 200 (or 2xx) call
decode-response and return the document, if status is 404 return nil, otherwise
surface the error to callers (e.g., throw an ex-info or return an error map
containing status and decoded body) so functions like get-document and tests
cannot mistake server errors for "not found"; use the existing url-with-path and
decode-response helpers when decoding the error body so the payload is included
in the raised error/returned error map.

Comment on lines +48 to +50
(let [url (url-with-path conn index-name)
response (client/head url (merge (:http-opts conn) {:throw-exceptions false}))]
(= 200 (:status response))))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only treat 404 as “missing”.

These helpers currently convert any non-200 response into false/[], so 401/403/5xx failures look the same as “index or alias does not exist”. Callers use these predicates to decide whether to create indexes and aliases, so a transient ES failure can trigger the wrong mutating path instead of surfacing the error.

Suggested fix
 (defn exists?
   "Return true if the given index exists"
   [conn index-name]
   (let [url (url-with-path conn index-name)
-        response (client/head url (merge (:http-opts conn) {:throw-exceptions false}))]
-    (= 200 (:status response))))
+        response (client/head url (merge (:http-opts conn) {:throw-exceptions false}))
+        status (:status response)]
+    (case status
+      200 true
+      404 false
+      (throw (ex-info (str "Index exists check failed with status " status)
+                      {:status status :body (:body response)})))))

 (defn get-aliases
   "Get index aliases"
   [conn index-name]
   (let [url (url-with-path conn index-name "_alias")
         response (client/get url (merge (:http-opts conn)
                                         {:accept :json
                                          :throw-exceptions false}))
         status (:status response)]
-    (if (= 404 status)
-      []
-      (let [resp (decode-response response)
-            aliases (keys (get-in resp [(keyword index-name) :aliases]))]
-        (mapv name aliases)))))
+    (case status
+      200 (let [resp (decode-response response)
+                aliases (keys (get-in resp [(keyword index-name) :aliases]))]
+            (mapv name aliases))
+      404 []
+      (throw (ex-info (str "Get aliases failed with status " status)
+                      {:status status :body (:body response)})))))

Also applies to: 121-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/es_index_helper.clj` around lines 48
- 50, The current predicate treats any non-200 as "missing" by returning false;
update the logic around the client/head call (using url-with-path, client/head,
(:http-opts conn) and index-name) so that it returns true for status 200,
returns false only for status 404, and for any other status throws or propagates
an error (e.g. throw or ex-info with the response) so transient 401/403/5xx
failures are surfaced; apply the same change to the analogous helper at the
other block (lines ~121-130).

Comment on lines +324 to +327
(let [body (get-in (ex-data e) [:body])
status (:status (ex-data e))
parsed-body (if (string? body) (json/decode body true) body)]
{:error parsed-body :status status})))))
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the elasticsearch.clj file
find . -name "elasticsearch.clj" -type f

Repository: nasa/Common-Metadata-Repository

Length of output: 187


🏁 Script executed:

# Read the file to see the actual code at lines 324-327 and surrounding context
cat -n indexer-app/src/cmr/indexer/data/elasticsearch.clj | sed -n '310,340p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1738


🏁 Script executed:

# Check what json library is imported in this file
grep -n "require\|import" indexer-app/src/cmr/indexer/data/elasticsearch.clj | head -20

Repository: nasa/Common-Metadata-Repository

Length of output: 882


🏁 Script executed:

# Get the beginning of the file to see the requires/imports
head -50 indexer-app/src/cmr/indexer/data/elasticsearch.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 2035


🏁 Script executed:

# Search specifically for json library usage and import
grep -n "json" indexer-app/src/cmr/indexer/data/elasticsearch.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 267


🏁 Script executed:

# Look for other usages of json/decode in this file to see if there's error handling elsewhere
grep -B 3 -A 3 "json/decode\|json/encode" indexer-app/src/cmr/indexer/data/elasticsearch.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 362


🏁 Script executed:

# Check the test file for exception handling tests
cat indexer-app/test/cmr/indexer/test/data/elasticsearch.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 10884


🏁 Script executed:

# Look for other exception handling patterns in the codebase involving json/decode
rg -A 5 -B 5 "json/decode" indexer-app/src/cmr/indexer/

Repository: nasa/Common-Metadata-Repository

Length of output: 2196


🏁 Script executed:

# Search for any existing error handling or try-catch patterns around json operations in the broader codebase
rg -A 3 -B 3 "json/decode.*catch|catch.*json/decode" .

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Check if there are any comments or documentation about exception handling with elasticsearch in the file
grep -B 5 -A 5 "ExceptionInfo\|exception\|error" indexer-app/src/cmr/indexer/data/elasticsearch.clj | head -50

Repository: nasa/Common-Metadata-Repository

Length of output: 1971


🏁 Script executed:

# Check what cheshire.core json/decode does with invalid JSON - search for any handling in the codebase
rg "json/decode" -A 3 -B 3 . | grep -A 5 -B 5 "catch\|try"

Repository: nasa/Common-Metadata-Repository

Length of output: 16464


🏁 Script executed:

# Look for documentation or comments about json/decode behavior
rg -i "cheshire\|json.*decode" . --type md --type txt

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Check the clj-http dependency since exceptions might come from there
grep -r "clj-http" indexer-app/project.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Look at the elastic-utils pattern more closely - it has the same code!
cat -n elastic-utils-lib/src/cmr/elastic_utils/index_util.clj | sed -n '1,100p'

Repository: nasa/Common-Metadata-Repository

Length of output: 4182


🏁 Script executed:

# Check if there are any integration tests showing non-JSON error responses from Elasticsearch
rg -i "elasticsearch.*error\|es.*error\|html\|text/plain" indexer-app/test/ -A 3 -B 3

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Search for documentation or comments about error response formats
rg "body.*string\|response.*format\|non-json\|plain text" . -i

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Find the exact location of the json/decode in elastic-utils index_util.clj
rg -n "parsed-body.*json/decode" elastic-utils-lib/src/cmr/elastic_utils/index_util.clj -B 5 -A 5

Repository: nasa/Common-Metadata-Repository

Length of output: 571


🏁 Script executed:

# Look at the broader context of that function
grep -n "catch\|try\|defn" elastic-utils-lib/src/cmr/elastic_utils/index_util.clj | tail -20

Repository: nasa/Common-Metadata-Repository

Length of output: 619


🏁 Script executed:

# Get the complete function containing the json/decode in index_util.clj
sed -n '130,160p' elastic-utils-lib/src/cmr/elastic_utils/index_util.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 1539


🏁 Script executed:

# Check if cheshire's json/decode throws on invalid JSON
rg -A 10 "cheshire" . --type md | head -30

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Look at what kind of responses clj-http can return in the body when there's an error
rg "clj-http\|ExceptionInfo" indexer-app/src/cmr/indexer/data/elasticsearch.clj -B 3 -A 3

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Search for any comments or documentation about what bodies might look like
grep -r "HTML\|text/plain\|content-type\|response.*format" indexer-app/src/cmr/indexer/data/elasticsearch.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Let me check what cheshire documentation says and look for any evidence of json/decode behavior
# First, let's see if there are comments or code showing what happens with malformed JSON
rg "JsonParseException\|ParseException\|json.*error\|decode.*error" . -i --type java --type clj | head -20

Repository: nasa/Common-Metadata-Repository

Length of output: 103


🏁 Script executed:

# Check how json/decode is being called in transmit/http_helper.clj since it already has the pattern
cat -n transmit-lib/src/cmr/transmit/http_helper.clj | sed -n '1,50p'

Repository: nasa/Common-Metadata-Repository

Length of output: 2008


🏁 Script executed:

# Look at the specific pattern used in search_util.clj more carefully
grep -n "json/decode" system-int-test/src/cmr/system_int_test/utils/search_util.clj -B 5 -A 5

Repository: nasa/Common-Metadata-Repository

Length of output: 5175


🏁 Script executed:

# Check if there are any tests for the try-elastic-operation function with non-JSON bodies
find . -path "*/test*" -name "*.clj" -type f -exec grep -l "try-elastic-operation\|try_elastic_operation" {} \;

Repository: nasa/Common-Metadata-Repository

Length of output: 57


🏁 Script executed:

# Look for any real examples of ExceptionInfo being thrown with non-JSON bodies
rg "ExceptionInfo.*body\|ex-info.*:body" . -B 3 -A 3 | head -50

Repository: nasa/Common-Metadata-Repository

Length of output: 57


Wrap json/decode to handle non-JSON error response bodies.

Line 326 assumes any string body is valid JSON. If Elasticsearch returns plain text, HTML, or other non-JSON content, json/decode throws and breaks the error handling path. This pattern is also present in elastic-utils-lib/src/cmr/elastic_utils/index_util.clj (lines 195-198, in a macro).

The codebase already handles this correctly elsewhere (e.g., transmit-lib/src/cmr/transmit/http_helper.clj with safe-parse-json). Apply the same pattern here.

Proposed fix
       (catch clojure.lang.ExceptionInfo e
         (let [body (get-in (ex-data e) [:body])
               status (:status (ex-data e))
-              parsed-body (if (string? body) (json/decode body true) body)]
+              parsed-body (if (string? body)
+                            (try
+                              (json/decode body true)
+                              (catch Exception _
+                                body))
+                            body)]
           {:error parsed-body :status status})))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer-app/src/cmr/indexer/data/elasticsearch.clj` around lines 324 - 327,
The error handler currently calls json/decode on any string body (in the let
binding that defines body/status/parsed-body) which will throw for non-JSON
responses; replace that direct json/decode call with the existing safe parsing
pattern used elsewhere (e.g., call the safe-parse-json helper from
transmit/http_helper.clj or wrap json/decode in a try/catch that returns the
original string on failure) so parsed-body never throws on plain text/HTML;
update the let where parsed-body is computed to use safe-parse-json (or
equivalent try/catch) instead of (json/decode body true).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@eereiter, I'll apply the safe-parsing fix to both affected files now.

  • Fix json/decode in indexer-app/src/cmr/indexer/data/elasticsearch.clj to wrap it in try/catch
  • Fix the same unsafe pattern in elastic-utils-lib/src/cmr/elastic_utils/index_util.clj

🏁 Script executed:

#!/bin/bash
# Get the context around the json/decode in index_util.clj
sed -n '185,210p' elastic-utils-lib/src/cmr/elastic_utils/index_util.clj

Length of output: 1160


@eereiter, here are the exact diffs for both files:

1. indexer-app/src/cmr/indexer/data/elasticsearch.clj

       (catch clojure.lang.ExceptionInfo e
         (let [body (get-in (ex-data e) [:body])
               status (:status (ex-data e))
-              parsed-body (if (string? body) (json/decode body true) body)]
+              parsed-body (if (string? body)
+                            (try
+                              (json/decode body true)
+                              (catch Exception _
+                                body))
+                            body)]
           {:error parsed-body :status status})))))

2. elastic-utils-lib/src/cmr/elastic_utils/index_util.clj (macro, uses gensym'd names)

     (catch clojure.lang.ExceptionInfo e#
       (let [body# (get-in (ex-data e#) [:body])
             status# (:status (ex-data e#))
-            parsed-body# (if (string? body#) (json/decode body# true) body#)]
+            parsed-body# (if (string? body#)
+                           (try
+                             (json/decode body# true)
+                             (catch Exception _
+                               body#))
+                           body#)]
         {:error parsed-body# :status status#}))))

Both changes wrap the json/decode call in a try/catch that falls back to the raw string on any parse failure, so plain-text or HTML error responses from Elasticsearch no longer break the error-handling path.


The requested changes have been implemented and a pull request has been created: View PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't mean for code rabbit to fix it. My comment was meant for Jyna to look at it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahahah i love that it responded to you with a rocket emoji

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 4.04624% with 166 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.51%. Comparing base (565367d) to head (01ae7a5).

Files with missing lines Patch % Lines
...stic-utils-lib/src/cmr/elastic_utils/es_helper.clj 0.00% 76 Missing ⚠️
...tils-lib/src/cmr/elastic_utils/es_index_helper.clj 0.00% 45 Missing ⚠️
...lastic-utils-lib/src/cmr/elastic_utils/es_util.clj 26.31% 14 Missing ⚠️
...lastic-utils-lib/src/cmr/elastic_utils/connect.clj 14.28% 12 Missing ⚠️
...tils-lib/src/cmr/elastic_utils/search/es_debug.clj 0.00% 5 Missing ⚠️
indexer-app/src/cmr/indexer/data/elasticsearch.clj 0.00% 4 Missing ⚠️
...trap-app/src/cmr/bootstrap/data/rebalance_util.clj 0.00% 2 Missing ⚠️
...tils-lib/src/cmr/elastic_utils/search/es_index.clj 0.00% 2 Missing ⚠️
...ls-lib/src/cmr/elastic_utils/search/es_wrapper.clj 0.00% 2 Missing ⚠️
...exer/data/collection_granule_aggregation_cache.clj 0.00% 2 Missing ⚠️
... and 2 more
Additional details and impacted files
@@                    Coverage Diff                    @@
##           CMR-10369-es8-feature    #2418      +/-   ##
=========================================================
- Coverage                  56.55%   56.51%   -0.05%     
=========================================================
  Files                       1068     1069       +1     
  Lines                      73580    73646      +66     
  Branches                    2057     2057              
=========================================================
+ Hits                       41613    41618       +5     
- Misses                     30051    30112      +61     
  Partials                    1916     1916              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jmaeng72 jmaeng72 marked this pull request as ready for review May 1, 2026 13:36
[nasa-cmr/cmr-common-lib "0.1.1-SNAPSHOT"]
[nasa-cmr/cmr-transmit-lib "0.1.0-SNAPSHOT"]
[org.clojure/clojure]
[ring/ring-jetty-adapter "1.14.2"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you not going to have to do some merge with your other change for this?

(esri/refresh (context->conn context es-config/elastic-name))
(esri/refresh (context->conn context es-config/gran-elastic-name)))
(esi-helper/refresh (context->conn context es-config/elastic-name) nil)
(esi-helper/refresh (context->conn context es-config/gran-elastic-name) nil))
Copy link
Copy Markdown
Contributor

@eereiter eereiter May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nil in these two calls refreshes all of the indexes? That is what I think is happening - I just want to confirm.

(es-util/decode-response
(client/get url (merge (:http-opts conn)
{:accept :json
:throw-exceptions false}))))) No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a new line here.

(map join-names)
(filter identity)
(string/join "/"))]
(str (:uri conn) "/" path))) No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline

(es-util/decode-response
(http/post url
(merge (:http-opts conn)
{:body (-> (map json/encode operations)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain in a comment what we are doing with the \n and why es needs it?

(when (esi-helper/exists? conn index-name)
(let [admin-token (:admin-token config)
response (client/delete (esr/index-url conn index-name)
response (client/delete (str (:uri conn) "/" index-name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a helper for this?

documents (map #(esd/create conn "1_autocomplete" "_doc" %) test-values)]
(let [root (url/elastic-root es-config/elastic-name)
url (str root "/1_autocomplete/_doc")
documents (map #(client/post url {:body (json/generate-string %)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use mapv here, map is lazy and mapv is eager

"Performs a bulk operation"
([conn operations] (bulk conn operations nil))
([conn operations params]
(when (not-empty operations)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(when (seq ...

(defn decode-response
"Decodes the response body from the given response"
[response]
(-> response
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(parse-safely (:body response))) is more succinct.

Copy link
Copy Markdown
Contributor

@eereiter eereiter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fix the code rabbit finding in indexer-app/src/cmr/indexer/data/elasticsearch.clj

@daniel-zamora
Copy link
Copy Markdown
Contributor

common-lib/src/cmr/common/data/bulk.clj:
  2    "Implementation of elastisch bulk index preparation that supports version and version_type.
  3:   The original implementation is in clojurewerkz.elastisch.rest.bulk namespace.")
  4  

indexer-app/src/cmr/indexer/data/bulk.clj:
  2    "Implementation of elastisch bulk index preparation that supports version and version_type.
  3:   The origninal implementation is in clojurewerkz.elastisch.rest.bulk namespace.")
  4  

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants